

 conformational landscape


Appendices and Supplementary Material

Neural Information Processing Systems

A.1 Equations for Conformational Energy Landscape Overlap Analysis. To quantify the similarity between the protein conformations generated by AI-based models and those in the ProteinConformers dataset, three commonly used overlap metrics are employed: intersection overlap, coverage, and the Jaccard index. These metrics evaluate the extent of agreement in low-energy regions between conformers of the same protein produced by different models, based on a specified energy threshold. Let A = {A_{i,j}} and B = {B_{i,j}}, where i, j ∈ [0, N], denote the two-dimensional free energy landscapes corresponding to two conformational ensembles. Each element A_{i,j} and B_{i,j} represents the free energy value at a specific grid point in the conformational energy landscape. For a given energy threshold τ (e.g., 40 kJ/mol), the number of shared low-energy conformations is defined as: |A ∩ B| = …

Figure 6: Comparison of conformational landscapes for protein T1030, generated by ProteinConformers and protein conformation generative models.
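The snippet above breaks off before the formulas, so the following is only a minimal sketch of how such threshold-based overlap metrics could be computed, not the appendix's own definitions. It assumes that a grid point counts as low-energy when its free energy is at or below τ, that coverage is the shared count divided by the low-energy count of the reference landscape A, and that the Jaccard index is the shared count divided by the size of the union. The function name `overlap_metrics` and the returned keys are hypothetical.

```python
from typing import Dict, Sequence


def overlap_metrics(
    A: Sequence[Sequence[float]],
    B: Sequence[Sequence[float]],
    tau: float,
) -> Dict[str, float]:
    """Compare the low-energy regions of two free energy landscapes.

    A and B are 2D grids of free energy values on the same grid.
    ASSUMPTION: a grid point is low-energy when its value is <= tau.
    ASSUMPTION: coverage is measured relative to A (the reference).
    """
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("landscapes must be defined on the same grid")

    n_a = n_b = shared = 0
    for row_a, row_b in zip(A, B):
        for a, b in zip(row_a, row_b):
            low_a = a <= tau
            low_b = b <= tau
            n_a += low_a                 # low-energy points in A
            n_b += low_b                 # low-energy points in B
            shared += low_a and low_b    # low-energy in both landscapes

    union = n_a + n_b - shared
    return {
        "intersection": shared,
        "coverage": shared / n_a if n_a else 0.0,
        "jaccard": shared / union if union else 0.0,
    }


# Toy 2x2 landscapes (illustrative values in kJ/mol, not from the paper).
A = [[10.0, 50.0], [30.0, 60.0]]
B = [[20.0, 35.0], [70.0, 80.0]]
metrics = overlap_metrics(A, B, tau=40.0)
```

With these toy grids, A has two low-energy points, B has two, and one is shared, so the sketch reports an intersection of 1, a coverage of 0.5, and a Jaccard index of 1/3. Unsampled grid points can be represented as `float("inf")` so that they never pass the threshold.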


AI-based Methods for Simulating, Sampling, and Predicting Protein Ensembles

arXiv.org Artificial Intelligence

Advances in deep learning have opened an era of abundant and accurate predicted protein structures; however, similar progress in protein ensembles has remained elusive. This review highlights several recent research directions towards AI-based predictions of protein ensembles, including coarse-grained force fields, generative models, multiple sequence alignment perturbation methods, and modeling of ensemble descriptors. An emphasis is placed on realistic assessments of the technological maturity of current methods, the strengths and weaknesses of broad families of techniques, and promising machine learning frameworks at an early stage of development. We advocate for "closing the loop" between model training, simulation, and inference to overcome challenges in training data availability and to enable the next generation of models.


MD-LLM-1: A Large Language Model for Molecular Dynamics

arXiv.org Artificial Intelligence

Molecular dynamics (MD) is a powerful approach for modelling molecular systems, but it remains computationally intensive on the spatial and time scales of many macromolecular systems of biological interest. To explore the opportunities offered by deep learning to address this problem, we introduce a Molecular Dynamics Large Language Model (MD-LLM) framework to illustrate how LLMs can be leveraged to learn protein dynamics and discover states not seen in training. By applying MD-LLM-1, the first implementation of this approach, obtained by fine-tuning Mistral 7B, to the T4 lysozyme and Mad2 protein systems, we show that training on one conformational state enables the prediction of other conformational states. These results indicate that MD-LLM-1 can learn the principles for exploring the conformational landscapes of proteins, although it does not yet explicitly model their thermodynamics and kinetics.


MoDyGAN: Combining Molecular Dynamics With GANs to Investigate Protein Conformational Space

arXiv.org Artificial Intelligence

Extensively exploring protein conformational landscapes remains a major challenge in computational biology due to the high computational cost involved in dynamic physics-based simulations. In this work, we propose a novel pipeline, MoDyGAN, that leverages molecular dynamics (MD) simulations and generative adversarial networks (GANs) to explore protein conformational spaces. MoDyGAN contains a generator that maps Gaussian distributions into MD-derived protein trajectories, and a refinement module that combines ensemble learning with a dual-discriminator to further improve the plausibility of generated conformations. Central to our approach is an innovative representation technique that reversibly transforms 3D protein structures into 2D matrices, enabling the use of advanced image-based GAN architectures. We use three rigid proteins to demonstrate that MoDyGAN can generate plausible new conformations. We also use deca-alanine as a case study to show that interpolations within the latent space closely align with trajectories obtained from steered molecular dynamics (SMD) simulations. Our results suggest that representing proteins as image-like data unlocks new possibilities for applying advanced deep learning techniques to biomolecular simulation, leading to an efficient sampling of conformational states. Additionally, the proposed framework holds strong potential for extension to other complex 3D structures.


Artificial intelligence techniques for integrative structural biology of intrinsically disordered proteins

arXiv.org Artificial Intelligence

We outline recent developments in artificial intelligence (AI) and machine learning (ML) techniques for integrative structural biology of intrinsically disordered protein (IDP) ensembles. IDPs challenge the traditional protein structure-function paradigm by adapting their conformations in response to specific binding partners, leading them to mediate diverse and often complex cellular functions such as biological signaling, self-organization, and compartmentalization. Obtaining mechanistic insights into their function can therefore be challenging for traditional structure determination techniques. Often, scientists have to rely on piecemeal evidence drawn from diverse experimental techniques to characterize their functional mechanisms. Multiscale simulations can help bridge critical knowledge gaps about IDP structure-function relationships; however, these techniques also face challenges in resolving emergent phenomena within IDP conformational ensembles. We posit that scalable statistical inference techniques can effectively integrate information gleaned from multiple experimental techniques as well as from simulations, thus providing access to atomistic details of these emergent phenomena.